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Abstract 

We consider perpetuities of the form 

D = B\ exp (Fi) + B 2 exp (Y 1 + Y 2 ) + 

where the Yj's and Bj's might be i.i.d. or jointly driven by a suitable Markov chain. 
We assume that the Y^-'s satisfy the so-called Cramer condition with associated root 
6* G (0,oo) and that the tails of the Bj's are appropriately behaved so that D is 
regularly varying with index 6*. We illustrate by means of an example that the natu- 
ral state-independent importance sampling estimator obtained by exponentially tilting 
the Kj's according to 9* fails to provide an efficient estimator (in the sense of appro- 
priately controlling the relative mean squared error as the tail probability of interest 
gets smaller). Then, we construct estimators based on state-dependent importance 
sampling that are rigorously shown to be efficient. 



1 Introduction 



We consider the problem of developing efficient rare-event simulation methodology for com- 
puting the tail of a perpetuity (also known as infinite horizon discounted reward). Perpetu- 
ities arise in the context of ruin problems with investments and in the study of financial time 
series such as ARCH-type processes (see for example, Embrechts et al. (1997) and Nyrhinen 
(2001)). 

In the sequel we let X = (X n : n > 0) be an irreducible finite state-space Markov chain 
(see Section 2 for precise definitions). In addition, let ((£ n , r] n ) : n > 1) be a sequence 
of i.i.d. (independent and identically distributed) two dimensional r.v.'s (random variables) 
independent of the process X. Given X Q = xq and D = d Q the associated (suitably scaled 
by a parameter A > 0) discounted reward at time n takes the form 

D n (A) = d + X (Xi, m ) A exp (Si) + A (X 2 , r] 2 ) A exp (S 2 ) 
+ ... + A (X n , 7} n ) A exp (S n ) 
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where the accumulated rate process (S k '■ k > 0) satisfies 

Sjfc+i = S k + j (X k+1 , £k+i) > 

given an initial value Sq = Sq. In order to make the notation compact, throughout the rest 
of the paper we shall often omit the explicit dependence of A in D n (A) and we will simply 
write D n . We stress that A > has been introduced as a scaling parameter which eventually 
will be sent to zero. Introducing A, as we shall see, will be helpful in the development of 
the state-dependent importance sampling algorithm that we study here. 

The functions (7 ( G 5,z 6 I) and (A (x, z) : x G S, z G M) are deterministic and 

represent the discount and reward rates respectively. For simplicity we shall assume that 
A (•) is non-negative. Define 

4>( so ,do,xo) (A) = -P(Ax, > l\S = s ,D = d ,X = x ) 

= P(T A < oo\S = s , D = d ,X = x Q ) , (1) 

where T A = inf{n > : D n (A) > 1}. 

Throughout this paper the distributions of X(x, rji) and 7(0;, £1) are assumed to be known 
both analytically and via simulation, as well as the transition probability of the Markov chain 
Xj. Our main focus on this paper is on the efficient estimation via Monte Carlo simulation 
of <p (A) = 0(q,o,xo) (A) as A \ under the so-called Cramer condition (to be reviewed in 
Section 2) which in particular implies (see Theorem [1] below) 

0(A) = c*A e *(l + o(l)) (2) 

for a given pair of constants c*, 9* G (0, 00). Note that 

^ ( a ) = p u£ ex p ^ x ^ > ' 

so A corresponds to the inverse of the tail parameter of interest. 

Although our results will be obtained for so = = do, it is convenient to introduce the slightly 
more general notation in (TjQ) to deal with the analysis of the state-dependent algorithms that 
we will introduce. 

Approximation (|2j) is consistent with well known results in the literature (e.g. Goldie (1991)) 
and it implies a polynomial rate of decay to zero, in 1/A, for the tail of the distribution of 
the perpetuity X]fcli ex P (Sk) A (X k ,r] k ). The construction of our efficient Monte Carlo pro- 
cedures is based on importance sampling, which is a variance reduction technique popular 
in rare-event simulation (see, for instance, Asmussen and Glynn (2008)). It is important 
to emphasize that, since our algorithms are based on importance sampling, they allow to 
efficiently estimate conditional expectations of functions of the sample path of {D n } given 
that Ta < 00. The computational complexity analysis of the estimation of such conditional 
expectations is relatively straightforward given the analysis of an importance sampling al- 
gorithm based on 0(A) (see for instance the discussion in Adler, Blanchet and Liu (2010)). 
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Therefore, as it is customary in rare-event simulation, we concentrate solely on the algorith- 
mic analysis of a class of estimators for (A). 

Asymptotic approximations related to (j2J) go back to Kesten (1973) who studied a suitable 
multidimensional analogue of D^. In the one-dimensional setting a key reference is Goldie 
(1991). Under i.i.d. assumptions, he gave an expression for the constant c* which is only 
explicit if 6* is an integer. More recent work on this type of asymptotics was conducted 
by Benoite de Saporta (2005), who assumed that the interest rate process {A(X n ,?7 n )} itself 
forms a finite state Markov chain, and by Enriquez et. al (2009), who obtained a different 
representation for c* if the Xi are i.i.d. (we will refer to this important special case as the 
i.i.d. case). Collamore (2009) studied the case when the sequence {X(X n ,r] n )} is i.i.d. (not 
dependent on X n ) and {7(A n ,£ n )} is modulated by a Harris recurrent Markov chain {X n }. 
Since in our case we need certain Markovian assumptions that apparently have not been 
considered in the literature, at least in the form that we do here, in Section 2, we establish 
an asymptotic result of the form (T5]) that fits our framework. 

Our algorithms allow to efficiently compute (A) with arbitrary degree of precision, in 
contrast to the error implied by asymptotic approximations such as (J2J). In particular, 
our algorithms can be used to efficiently evaluate the constant c*, whose value is actually 
of importance in the statistical theory of ARCH processes (see for example, Chapter 8 of 
Embrechts et al. (1997)). 

The efficiency of our simulation algorithms is tested according to widely applied criteria in the 
context of rare event simulation. These efficiency criteria requires the relative mean squared 
error of the associated estimator to be appropriately controlled (see for instance the text of 
Asmussen and Glynn (2008)). Let us recall some basic notions on these criteria in rare-event 
simulation. An unbiased estimator is said to be strongly efficient if E(Z\) = 0(0 (A) 2 ). 
The estimator is said to be asymptotically optimal if E (Z\) < 0(0 (A) 2_e ) for every e > 0. 
Jensen's inequality yields EZ\ > (A) 2 , so asymptotic optimality requires the best possible 
rate of decay for the second moment, and hence the variance, of the underlying estimator. 
Despite being a weaker criterion than strong efficiency, asymptotic optimality is perhaps the 
most popular efficiency criterion in the rare-event simulation literature given its convenience 
yet sufficiency to capture the rate of decay. 

We shall design both strongly efficient and asymptotically optimal estimators and explain 
the advantages and disadvantages behind each of them from an implementation standpoint. 
Some of these points of comparison relate to the infinite horizon nature of D^. We are 
interested in studying unbiased estimators. In addition, besides the efficiency criteria we 
just mentioned, at the end we are also interested in being able to estimate the overall 
running time of the algorithm and argue that the total computational cost scales graciously 
as A — > 0. Our main contributions are summarized as follows: 

1) The development of an asymptotically optimal state-dependent importance sampling es- 
timator for 0(A) (see Theorem [3]). The associated estimator is shown to be unbiased and 
the expected termination time of the algorithm is of order O (log(l/A) p ) for some p < oo 
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(see Proposition [2] in Section E]). 



2) An alternative, state-independent, estimator is also constructed which is strongly effi- 
cient, see Proposition |2j The state- independent estimator, however, often will have to be 
implemented incurring in some bias (which can be reduced by increasing the length of a 
simulation run). 

3) New proof techniques based on Lyapunov inequalities. Although Lyapunov inequalities 
have been introduced recently for the analysis of importance sampling estimators in Blanchet 
and Glynn (2008), the current setting demands a different approach for constructing the 
associated Lyapunov function given that the analysis of (A) involves both light-tailed and 
heavy-tailed features (see the discussion later this section; also see Proposition [1] in Section 

ED- 

4) A new class of counter-examples showing that applying a very natural state-independent 
importance sampling strategy can in fact lead to infinite variance (see Section f3 . 2 [) . This 
contribution adds to previous work by Glasserman and Kou (1995) and further motivates 
the advantages of state-dependent importance sampling. 

5) The development of an asymptotic result of the form (TS]) that may be of independent 
interest (see Theorem [Q. 

As we mentioned earlier, importance sampling is a variance reduction technique whose ap- 
propriate usage leads to an efficient estimator. It consists in sampling according to a suitable 
distribution in order to appropriately increase the frequency of the rare event of interest. 
The corresponding estimator is just the indicator function of the event of interest times 
the likelihood ratio between the nominal distribution and the sampling distribution evalu- 
ated at the observed outcome. The sampling distribution used to simulate is said to be the 
importance sampling distribution or the change-of- measure. Naturally, in order to design 
efficient estimators one has to mimic the behavior of the zero-variance change-of-measure, 
which coincides precisely with the conditional distribution of {D n } given > 1. Now, 
assume that Sq = = D . As is known in the literature on ARCH (it is actually made 
precise in Enriquez et. al. (2009)), the event Ta < oo is typically caused by the event 
Ea = {niax fc > Sk > log(l/A)}, which is the event that the additive process {Sk} hits a 
large value; see Section 2 for more discussion. In turn, the limiting conditional distribution 
of the underlying random variables given as A J, is well understood and strongly effi- 
cient estimators based on importance sampling have been developed for computing P(E^) 
(see Collamore (2002)). Surprisingly, as we will show by means of a simple example, directly 
applying the corresponding importance sampling estimator which is strongly efficient for 
P(Ea) can actually result in infinite variance for any A > when estimating (A) (see 
Section 3.2). 
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Given the issues raised in the previous paragraph, the development of efficient simulation 
estimators for computing (A) calls for techniques that go beyond the direct application of 
standard importance sampling estimators. In particular, our techniques are based on state- 
dependent importance sampling, which has been substantially studied in recent years (see 
for example, Dupuis and Wang (2004 and 2007), Blanchet and Glynn (2008), and Blanchet, 
Leder and Glynn (2009)). The work of Dupuis and Wang provides a criterion, based on a 
suitable non-linear partial differential inequality, in order to guarantee asymptotic optimality 
in light tailed settings. It is crucial for the development of Dupuis and Wang to have an 
exponential rate of decay in the parameter of interest (in our case 1/A). Blanchet and Glynn 
(2008) develop a technique based on Lyapunov inequalities that provides a criterion that can 
be used to prove asymptotic optimality or strong efficiency beyond the exponential decay rate 
setting. Such criterion, however, demands the construction of a suitable Lyapunov function 
whose nature varies depending on the type of large deviations environment considered (light 
vs heavy-tails). The construction of such Lyapunov functions has been studied in light and 
heavy-tailed environments (see for instance, Blanchet, Leder and Glynn (2009) and Blanchet 
and Glynn (2008)). 

The situation that we consider here is novel since it has both light and heavy-tailed features. 
On one hand, the large deviations behavior is caused by the event Ea, which involves light- 
tailed phenomena. On the other hand, the scaling of the probability of interest, namely, 
0(A) is not exponential but polynomial in 1/A (i.e. the tail of the underlying perpetuity 
is heavy-tailed, in particular, Pareto with index Consequently, the Lyapunov function 
required to apply the techniques in Blanchet and Glynn (2008) includes new features relative 
to what has been studied in the literature. 

Finally, we mention that while rare event simulation of risk processes has been considered in 
the literature (see for instance Asmussen (2000)), such simulation in the setting of potentially 
negative interest rates has been largely unexplored. A related paper is that of Asmussen and 
Nielsen (1995) in which deterministic interest rates are considered. A conference proceedings 
version of this paper (Blanchet and Zwart (2007), without proofs) considers the related 
problem of estimating the tail of a perpetuity with stochastic discounting, but the discounts 
are assumed to be i.i.d. and premiums and claims are deterministic. Finally, we also note 
a paper by Collamore (2002), who considered ruin probability of multidimensional random 
walks that are modulated by general state space Markov chains. 

During the second revision of this paper Collamore et al (2011) proposed an independent 
algorithm for the tail distribution of fixed point equations, which include perpetuities as a 
particular case. We shall discuss more about this algorithm in Section 7. 

The rest of the paper is organized as follows. In Section 2 we state our assumptions and review 
some large deviations results for 0(A). Section 3 focuses on state-independent sampling. 
The state-dependent sampling algorithm is developed in Section 4, and its efficiency analysis 
and cost-per-replication are studied in Sections 5 and 6. In Section 7 we include additional 
extensions and considerations. Section 8 illustrates our results with a numerical example. 
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2 Some Large Deviations for Perpetuities 



As discussed in the Introduction, we shall assume that the process (S n : n > 0) is a Markov 
random walk. As it is customary in the large deviations analysis of quantities related to 
these objects, we shall impose some assumptions in order to guarantee the existence of an 
asymptotic logarithmic moment generating function for S^. 

First we have the following assumption: 

Assumption 1: We assume that X is an irreducible Markov chain taking values in a finite 
state space S with transition matrix (K (x, y) : x,y e S). Moreover, we further assume that 
the £fc's and 7 (■) satisfy 

sup E exp (#7 (x, £1)) < 00, (3) 
where M is a neighborhood of the origin. 

If Assumption 1 is in force, the Perron-Frobenius theorem for positive and irreducible ma- 
trices guarantees the existence of (uo (2) : x G S,9 e Af) and exp (ip (#)) so that 

u e (x) = E x [exp (0 7 (X u &) - ^ (9)) u e (X 1 )]. (4) 

The function uq (•) is strictly positive and unique up to constant scalings. Indeed, to see how 
the Perron-Frobenius theorem is applied, define 

E exp (6*7 (x, f 1)) = exp (x (x, 6)) 

and note that (BJ is equivalent to the eigenvalue problem 

{QeUe) (x) = exp (9)) u e (x) , 

where Q e (x, y) = K (x, y) exp (x (y, 

We also impose the following assumption that is often known as Cramer's condition. 

Assumption 2: Suppose that there exists 9* > such that ip (0*) = 0. Moreover, assume 
that there exists 9 > 9* such that ip (9) < 00. 

In order to better understand the role of Assumption 2, it is useful to note that under 
Assumption 1, given X = x , r (x ) = inf{£; > 1 : X k = x } is finite almost surely and 
D = YlkLi ex P (*^ fc ) ^ i.^ki Vk) admits the decomposition 

D = B + exp (Y) D' , (5) 
where D' is identical in distribution to D, and 

Y = S t(xo) , B = A (Xj, r]j) exp(Sj), (6) 
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and D' is equal in distribution to D and independent of [B, S T ^ Xo u. In other words, D can be 
represented as a perpetuity with i.i.d. pairs of reward and discount rates. This decomposition 
will be invoked repeatedly during the course of our development. Now, as we shall see in the 
proof of Theorem [1] below, it follows that 6* > appearing in Assumption 2 is the Cramer 
root associated to Y, that is, 

Eexp(9 m Y) = l. (7) 

Note also that since the moment generating function of Y is convex, and > 0, we must 
have that EY < and therefore by regenerative theory we must have that i?7(X 00 ,^i) < 0. 

An additional final assumption is imposed in our development. 
Assumption 3: Assume that sup^.^ E x \ (Xi,T)i) a < oo for each a G (0, oo). 

The following examples are given to illustrate the flexibility of our framework. 

Example 1 ARCH sequences have been widely used in exchange rate and log-return models 
(see for example, Embrechts et al. (1997)). In these models the object of interest, A n , are the 
standard deviations of the log-return. The simplest case of ARCH sequences is the ARCH(l) 
process, which satisfies 

Typically, the Z n 's are i.i.d. standard Gaussian random variables, and «o > and a± < 
1. The stationary distribution is a perpetuity. We can directly work with the stationary 
distribution of this process or transform the problem into one with constant rewards (equal 
to a ) by noting that 

T n+ i = a + aiA 2 n+l = a + ai(a + aiA 2 n )Z 2 +1 = a + aiT n Z^ +l . 

We obtain that 

Too — a = B 1 exp (Y x ) + B 2 exp (Y 1 + Y 2 ) + ... 
where Bi = a and Yj = log {pt\Zj) for % > 1. Assumptions 1 to 3 are in place in this setting. 

Example 2 A changing economic environment can be modeled by say, a two-state Markov 
chain denoting good and bad economic states. We can then model the discounted return 
of a long-term investment under economic uncertainty as a perpetuity with this underlying 
Markov modulation. Denoting X{ G {good, bad}, % = 1,2,... as the Markov chain, our 
return can be represented as 

D = B 1 exp( Y i) + B 2 exp(Fi + Y 2 ) + • • • 

where Bi = \(Xi,r)i) and Y, = 7(Aj,£j) are the return and discount rate at time i, and rji 
and £j are i.i.d. r.v.'s denoting the individual random fluctuations. 

We now state a result that is useful to test the optimality of our algorithms. 
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Theorem 1 Under Assumptions 1 to 3, 

0(A) = c*A e *(l + o(l)) 

as A \ 0. 



The proof of Theorem [T] will be given momentarily, but first let us discuss the intuition behind 
the asymptotics described in the theorem. It is well known that under our assumptions 
exp(max{5fc : k > 0}) is regularly varying with index — 9* < (a brief argument indicating 
the main ideas behind this fact is given in Section 3.1). The principle of the largest jump 
in heavy-tailed analysis indicates that the large deviations behavior of D is dictated by 
a large jump of size b/A arising from the largest contribution in the sum of the terms 
defining D. Given that the reward rates are light-tailed, such contribution is likely caused 
by exp(max{Sfc : k > 0}), as made explicit by Enriquez et. al. (2009) in the i.i.d. case. 
Therefore, the large deviations behavior of D is most likely caused by the same mechanism 
that causes a large deviations behavior in exp(max{Sfc : k > 0}). This type of intuition will 
be useful in developing an efficient importance sampling scheme for estimating the tail of D. 



Proof of Theorem Q]. Note that equations §5§ and (E]) allow us to apply Theorem 4.1 from 
Goldie (1991). In particular, in order to apply Goldie's results we need to show that 



for some 6 > 6* and that 



Ex exp (9*S T ( Xo )) = 1, 
E XQ exp (9S t{xo) ) < oo, 



E X0 B a < oo 



(9) 



(10) 



for some a > 9* (conditions (jSJ), flSD an d (jTUJ) here correspond to conditions (2.3), (2.4) and 
(4.2) respectively in Goldie (1991)). First we show ©. Note that equation (j3j) implies that 
the process 

use* Ue * ("^ n ) fa c \ 
M n * = — - exp {9*S n ) 

u e , {xo) 

is a positive martingale. Therefore, we have that 

1 = E Xo M n " AT ^ 
= E Xo [exp (0*5V(>o)) H T ( x o) < n)] 



+ E X0 



exp (9*S n ) I(t (x ) > n) 



u 9t , (x Q ) 

By the monotone convergence theorem we have that 

E [exp (9*S t{xo )) I(t (x ) < n)] — > E exp (9*S r{xo) ) 

implies that the matrix (Kg t (x,y) : x,y e S) 
ue, (y)exp (x (y,9*)) 



as n /*■ oo. On the other hand note that 
defined via 



Ke, {x, y) = K (x, y) 



u e , (x) 
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is an irreducible stochastic matrix. Now let \^K et (x, y) : x,y G S \{xo}j be the submatrix of 

Kg t (■) that is obtained by removing the row and column corresponding to state xq. Observe 
that 

A "' exp (9*S n ) I(r (x ) > n) 



Kll)(x ) = E X0 

' L u e* ( x o) 

By irreducibility we have that Kg t (•), therefore 

~ue, (X n ) 



E 



exp (9*S n ) I(r (x ) > n) 



as n / oo, obtaining (jSJ). The bound (jHJ) follows easily by noting that 
E X0 [exp (9S t{xo) )] 

oo 

= E XQ [exp (9S k ) ; r (x ) > k - 1, X k = x ] 



k=l 

OO 



E Xo [exp (g^fc-i) if (X fc -i, x ) exp (x (0, x ));r (x ) > k- 1]. 



fc=i 



So, if we define, for i^ij 



w e (x) = exp (x (0, x )) if (x, x 



u e (x) 



and 



for x, y 7^ x we see that 



Re (x, I/) = exp (x (0, y)) if (x, y) 



ue (y) 
ue (x) 



E xo [exp (eSV^,,))] = (Rev g ) (x ) 



fe=i 

Note that Re, = Ke, is strictly substochastic. So, by continuity there exists 9 > 9* for which 
Re has a spectral radius which is strictly less than one and therefore ([9]) holds. Finally, we 
establish ( flOl) . Observe that 

B < r (xo) max A (Xk, Vk) ex P (max{Sfc : 1 < k < r (xo)}) . 

l<fc<r(a;o) 

Therefore, for 1/p+l/q+l/r — 1 and p, g, r > 1 we have that 

EB a < Et (xo) a max A (X k ,7] k ) a exp (max{a5fc : 1 < k < r (xo)}) 

l<k<r(xo) 

< (E xo r (x ) pa ) 1/p x (E X0 max A (X k , Vk ) qa ) 1/q 

l<k<r(xo) 

x (E X0 exp (max{raS k : 1 < k < r (x )})) 1/r . 
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Since E xo t (x ) pa + E XQ max 1 < J ,< T ( I() ) A (X k , r] k ) qa < oo for all p,q e (0, oo) it suffices to show 
that 



for some 9 > 9*. In order to do this define T (b) = inf{fc > : S k > b} and note that 



Bound (Q implies that S T r X0 ) decays at an exponential rate that is strictly larger than 9*. To 
analyze P X0 (T (6) < r (x )), let P x * (■) (respectively E e x * o (•)) be the probability measure (resp. 
the expectation operator) associated to the change-of-measure induced by the martingale 
(M^* : n > 0) introduced earlier. Note that 



The probability P x * (r (x ) > T (b)) decays exponentially fast as b /* oo by using a standard 
large deviations argument on T (b) jb and because r (xq) has exponentially decaying tails. 
Therefore we obtain (jlip . which in turn yields the conclusion of the theorem. ■ 



3 State-Independent Importance Sampling 

In order to design efficient estimators we will apply importance sampling. It is well known 
(see for example, Liu (2001)) that the zero-variance importance sampling distribution is dic- 
tated by the conditional distribution of the process (in this case the triplet {(S n , D n , X n )}), 
given the occurrence of the rare event in question. As we discussed in the previous section, the 
occurrence of the event 7a < oo is basically driven by the tail behavior of max{S n : n > 0}. 
In turn, the large deviations behavior of Markov random walks (such as S) is well under- 
stood from a simulation standpoint and, under our assumptions, there is a natural state- 
independent change-of-measure that can be shown to be efficient for estimating the tail of 
max{S n : n > 0}. 

The present section is organized as follows. First, we shall explain the change-of-measure 
that is efficient for estimating the tail of meix{S n : n > 0} because it will serve as the basis 
for our change-of-measure in the setting of perpetuities (but some modifications are crucial 
to guarantee good performance). After that, we shall show by means of an example that this 
type of importance sampling algorithm can lead to estimators that have infinite variance. 
We then close this section with a modified state-independent importance sampling algorithm 
that is strongly efficient but biased. 



E Xo exp (max{9Sk : 1 < k < r (xq)}) < oo 



(11) 



P X0 (max{S k : 1 < k < r (x )} > b) < P X0 {T (6) <r(x )) 




<cexp(-9*b)P°;(T(x )>T(b)). 
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3.1 The standard approach 



As we indicated in the proof of Theorem [[J given 9 and X = xo equation (TjJ indicates that 
the process 

M e n = exp(6S n -n^(6))^4 } n>0 

is a positive martingale as long as ip (9) < oo and therefore it generates a change-of-measure. 
The probability measure in path-space induced by this martingale is denoted by P xo (-) and 
in order to simulate the process ((D n , S n , X n ) : n > 0) according to P Xo (-) one proceeds as 
follows. 

1. Generate Xi according to the transition matrix 

K B (x, y) = K (x, y) exp (x (y, 9) - ip (9)) u e (y) /u e (x) , 
which is guaranteed to be a Markov transition matrix by the definition of ug (•) and 

2. Given X% = y, sample 7 (y, £1) according to exponential tilting given by 

P e (7 (y, 6) e dz) = exp (9z - x (y, 9)) p ( 7 ( y , d) e dz) . 

3. Simulate X(y,r]i) according to the nominal conditional distribution of rji given that 
X 1 = y and 7 

The previous rules allow to obtain (Di, Si, Xi) given (D , S , X ). Subsequent steps are 
performed in a completely analogous way. 

Note that if one selects 9 = 9* then we have that ip(9*) =0 and also S n /n — ► ip' (9*) > 
a.s. with respect to P®* (■). If a (y) = inf {n > : S n > y] then 

P xo (max{S n :n>0}>y)= P xo (a (y) < 00) 

= E e x ;il/M°yo(y)<oo] 



E x exp(-0»S ff(l ,); 



In the last equality we have used that a(y) < 00 a.s. with respect to P®* (•). The importance 
sampling estimator 

Z = exp{-9*S a(y) ) j- v , 

u e , \X a{y) ) 

obtained by sampling according to P x (•) is unbiased and its second moment satisfies 

u e. Oo) 



< cexp (—29*y) , 



"3 (^ (y) ) 
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for some constant c G (0, oo). Since, under our assumptions (see for example, Asmussen 
(2003), Theorems 5.2 and 5.3 on Page 365) P Xo (a (y) < oo) = 7*exp(— 6*y) (1 + o(l)) as 
y /* oo for a suitable 7* G (0, 00) we obtain that Z' y is strongly efficient for estimating 
p x (^(y) < 00) as y /■ 00. 



3.2 Standard importance sampling can lead to infinite variance 

A natural approach would be to apply directly the previous change-of-measure for estimating 
Ta < 00. Nevertheless, we will show by means of a simple continuous-time example that 
can be fit in the discrete time setting (through simple embedding) that such approach does 
not guarantee efficiency. In fact, the estimator might even have infinite variance. 

Example 3: Let (X (t) : t > 0) be Brownian motion with negative drift —[/, and unit vari- 
ance. We are concerned with 0(A) = P{ f °° exp (X (t)) dt > 1/A). In particular, here 
we have ip (9) = t^ 1 log-Eexp (9X (t)) = —fx$ + 9 2 /2 (in the discrete time setting tp (6) 
will be — /jl9 + 9 2 /2 multiplied by the discretization time scale). In order to analyze the 
second moment of the natural estimator described previously we need to evaluate 9* so 
that ip (9*) = 0. This yields 9* = 2/i and the resulting importance sampling algorithm 
proceeds by simulating X (•) according to a Brownian motion with positive drift /j, and 
unit variance up to time Ta = inf{t > : f* exp (X (s)) ds > 1/A} and returning the 
estimator = exp (— 9*X (Ta)). The second moment of the estimator is then given 
by E e *[Z\;T& < 00] = E[Za',Ta < 00], by a change of measure back to the original 
one (here we have used E e * (■) to denote probability measure under which X (•) follows 
a Brownian motion with drift fi and unit variance). We will show that if 9* > 1 then 
£"[exp (— 9*X (T A ))| T A < 00] = 00. By the definition of 9* we will conclude that if /i > 1/2 
then 

E [Z A ; T A < 00] = E[exp (-0*X (T A ))| T A < oo]P (T A < 00) = 00 

which implies infinite variance of the estimator. In order to prove this we will use a re- 
sult of Pollack and Siegmund (1985) (see also Dufresne (1990)) which yields that D = 
J °° exp (X (t)) dt is equal in distribution to 1/Z, where Z is distributed gamma with a den- 
sity of the form 

exp (-Xz) X^z 9 *- 1 
h(z)- ^ , 

for some A > 0. In particular, a transformation of variable gives the density of D as 

My)-. eXP( ~ A/ ^ 



r(W 



,+1 



and hence 



P {D > 1/A) = U r (W .« * = — Sjw + / 1/a * 
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where the second inequality follows from integration by parts. Note that 

Ji/A 0.r(w +2 y - W*) y l/A y «.+2 y 0,(0, + i)r(0,) 171 ] 

as A \ 0. Hence P (D > 1/A) ~ cA 9 * where c = exp(-AA)A e */(W^)). Now, let W be 
a random variable equal in law to A[D — 1/A] given that D > 1/A . Note that conditional 
on T A < oo, we have 

POO 

D= exp(X(t))dt 
Jo 

/■Ta /-oo 

= / exp(X(t))dt + / exp(X(t))dt 
Jo Jt a 

= i + exp(X(T A )) /"°° exp(X(t) - X(T A ))dt 
= l + exp(X(T A )) J D / 

where Z?' has the same distribution as 1/Z and is independent of exp (X (T A )) given that 
T A < oo. We have used the strong Markov property and stationary increment property of 
Brownian motion in the fourth equality. Hence the random variable X (T A ) satisfies the 
equality in distribution 

W= d Aex V (X(T A ))D', 

Now, it is clear that E[(D')' e *] = EZ d * < oo, so it suffices to show that if 9* > 1 then 
E (W~ e *) = oo. Using the definition of W , transformation of variable gives 

fw H = r(0*)p^r>i/ A ) exp ( " A/(w/A + 1/A)) {w/A + VA)"^^. 

Therefore, we have that there exists a constant c G (0, oo) such that 

POO 

E (W~ e *) = f w H w- 9 "dw 
Jo 

poo 

> c / A" 1 exp {-\/{w/A + 1/A)) (w/A + l/A)-( e * +1 ) (Aw)'"* dw 

Jo 

> c / exp (-AA) 2-^ +1) w~ e *dw = oo. 

Jo 



The problem behind the natural importance sampling estimator is that one would like the 
difference [St a — log(l/A)] to stay positive, but unfortunately, this cannot be guaranteed 
and in fact, this difference will likely be negative. The idea that we shall develop in the next 
subsection is to apply importance sampling just long enough to induce the rare event. 
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3.3 A modified algorithm 



We select Q = Q* and simulate the process according to the procedure described in Steps 1 
to 3 explained in Section 3.1 up to time 

Ta/o. = inf{n > : D n > a}, 

for some a G (0,1). Subsequent steps of the process {(Sk, Dk, Xk)}, for k > Ta/o, are 
simulated under the nominal (original) dynamics up until Ta- The resulting estimator takes 
the form 

Z 1A = exp (-9.S TA/a ) U6 : {X0) , /(Ax> > 1). (12) 



u e* ( X T A/a ) 



We will discuss the problem of implementing this estimator in a moment, in particular the 
problem of sampling I{D ao > 1) in finite time. First we examine its efficiency properties. 
We assume for simplicity that the rewards are bounded, we will discuss how to relax this 
assumption right after the proof of this result. 



Theorem 2 In addition to Assumptions 1 and 2, suppose that there is deterministic constant 
m G (0, oo) such that A (x,rf) < m. Then, Z\ t A is a strongly efficient estimator of <ft (A). 

Proof. It is clear that Z^a is unbiased. Now, define 

U n = exp (—S n ) {D n - a} /A 
for n > 1. The process {U n } will be helpful to study the overshoot at time Ta/o.- Note that 
U n +i = A (X n+1 , 7] n+ i) + exp (-7 (X n+1 , Cn+i)) U n , (13) 
and also that we can write Ta/o, = inf{n > : U n > 0}. 

It is important to observe that 

Doo = a + exp (Sr A/o ) AU TA/a + exp (s Ta/o ) D'^, (14) 

where D'^ is conditionally independent of Sr A/a , Ur A/a given Xr A/a - In addition, D'^ is 
obtained from the original / nominal distribution. Decomposition (fill) implies that 

Ipoo > 1) < /(exp (s TA/a ) D' OQ >(l- a)/2) + /(exp (s TA/a ) AU TA/a > (1 - a)/2), 

and therefore, by conditioning on (S n , D n , X n ) for n < Ta/o, we obtain that the second 
moment of is bounded by 



^,o,„)N (~^S TA/a ) UdAXo) , 0(o,o,x Ta/J (exp (s TA/a ) 2A/(1 - a))] (15) 

ue* (XT A/a ) 

+£(OA*o)[ ex P (-2^ Ta/ .) ^ (X0) /(exp (5r A/- ) A^ A/a > (1 - a)/2)] . (16) 
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We will denote by I\ the term in (|15p and by I 2 the term in ([TBI) . It suffices to show that 
both Ji and I2 are of order O (A 26 *). 

Theorem [1] guarantees the existence of a constant c\ G (1, 00) so that 

0(o,o,-o) ( A ) < ciexp (e*S T ^ A"V(1 - a) e *. 
Using this bound inside (115]) we obtain that 

A9 * me, r_ f no c \ u e, ( x o) 



1 J WO. (*T A/a 



(17) 



for some constant mi > and thus, since 

£(Vo)t ex P (-^r A/a ) ^ (X0) J = (A) = O (A**) (18) 

u e , \^T A/a J 

we conclude that I\ = O (A 2 "*). 

We now study the term I 2 . Just as in the proof of Markov's inequality, note that for any 
> 

< ^ (r^)^(o,o,o)[ ex P (-2^r A/ .) exp (#7 Ta/ .) 7^ 0) v < 7 J- (19) 

We could pick, for instance, (3 = 39^/2 and use the fact that ug f ^-Xr A/o J > 8 for some 5 > 
to obtain that 

^„)N (-2^^T A/a ) exp (jJSr^) Ue * {Xo) Ulj (20) 

uo. [X TA/a J 

£ ^ (^,i«p (-".*.,.)]) 1/2 a^W- 

If we are able to show that 

^,o„o)(<J=°( 1 ) ( 21 ) 

as A \ 0, then we an conclude, owing to ( ITS]) , that the right hand side of ( 120]) is of order 
O (A 9 */ 2 ). Thus, comb ining this bound on ([20]) . together with ( f!9]) we would conclude that 
I2 = O (A 261 *) as required. It suffices then to verify (|2T]) . however, this is immediate since 

under our current assumptions we clearly have that Ur A/a < A \ XT A/a , VT A/a 1 < m. ■ 



We shall comment on two important issues behind this result. First, we have assumed that 
the rewards are bounded in order to simplify our analysis. Note that the only place that 
used this assumption is in establishing ([2T]) . It is possible to estimate the expectation in ([21]) 
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only under Assumption 3 using a Lyapunov bound similar to the one that we will discuss in 
Lemma [TJ 

Second, the estimator Z^a is unbiased only if we can generate in a finite time. Generating 
unbiased samples from under our current assumptions is not straightforward (see for 
example Diaconis and Freedman (1999) on issues related to steady-state distributions for 
iterated random functions, and Blanchet and Sigman (2011) for algorithms that can be used 
to sample under assumptions close to the ones that we impose here). Alternatively, one 
might recognize that is the steady-state distribution of a suitably defined Markov chain. 
In the presence of enough regeneration structure, one can replace the indicator in (JT2"|) by an 
estimator for the tail of based on the corresponding regenerative ratio representation. 
Note that this replacement would involve a routine simulation problem as there is no need to 
estimate any rare event. However, once again after using a regenerative-ratio based estimator 
one introduces bias. 

We shall not pursue more discussion on any of the two issues raised given that the class 
of estimators that we shall discuss in the next section are not only unbiased but are also 
asymptotically optimal as A — > and can be rigorously shown to have a running time that 
grows at most logarithmically in 1/A. 



4 State-Dependent Importance Sampling 

An issue that was left open in the previous section was that the estimator that we constructed 
is biased from a practical standpoint. In this section, we illustrate how to construct an 
efficient importance sampling estimator that terminates in finite time and is unbiased. The 
estimator based on applying state-independent importance sampling up until time Ta has 
been seen to be inefficient. Examples of changes-of- measure that look reasonable from a large 
deviations perspective but at the end turn out to have a poor performance are well known 
in the rare-event simulation literature (see Glasserman and Kou (1995)). It is interesting 
that estimating the tail of provides yet another such example. These types of examples 
have motivated the development of the theory behind the design of efficient state-dependent 
importance sampling estimators, which is the basis behind the construction of our estimator 
here. We shall explain some of the elements behind this theory next. 

We will follow the approach based on Lyapunov inequalities (see Blanchet and Glynn (2008) 
and Blanchet, Glynn and Liu (2007)). Let us introduce some notation for W n = (S n , D n , X n ). 
The transition kernel associated to W is denoted by Q (■), so 



A state-dependent importance sampling distribution for W is described by the Markov tran- 
sition kernel 



P W0 (Wi 6 A) = P ( Wx e A\ W = w ) = [ Q (w , dw) 



J A 




(22) 
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where r (•) is a positive function properly normalized so that 

J Q r (w ,dwi) = J r (wq.wiY 1 Q(w Q ,dwi) = I. 

The idea behind the Lyapunov method is to introduce a parametric family of changes-of- 
measure. As we shall see, in our case, this will correspond to suitably defined exponential 
changes-of- measure. This selection specifies r (•). The associated importance sampling esti- 
mator, which is obtained by sampling transitions from Q r {■), takes the form 

Z A = r (W , W ± ) r (W 1 , W 2 ) ...r (W Ta . u W Ta ) I (T A < oo) . 

Using Pw ^ (•) (resp. E^ (•)) to denote the probability measure (resp. the expectation oper- 
ator) induced by the transition kernel Q r (-) given that Wo = w, we can express the second 
moment of Z via 

v A (w) = E^Z 2 A = E W Z A . 
Note that conditioning on the first transition of the process W one obtains 

v A M = E w [r (w,W 1 )v A (W 1 )\, 

subject to the boundary condition v A (w) = 1 for tosRxjl, oo) x S. We are interested in a 
suitable upper bound for v A (w), which can be obtained by taking advantage of the following 
inequality proved in Blanchet and Glynn (2008). 

Lemma 1 If h A (•) is non-negative and satisfies 

E w [r (w, W x ) h A (Wi)] < h A (w) (23) 
subject to h A (w) > 1 for w G M x [1, oo) x S, then 

va (w) < h A (w) . 



Our strategy in state-dependent importance sampling is aligned with the intuition behind 
the failure of the natural state-independent importance sampling strategy described in the 
previous section; it consists in applying importance sampling only when it is "safe" to apply 
it. In other words, we wish to induce T A < oo by exponentially tilting the increments of S, 
but we want to be careful and maintain the likelihood ratio appropriately controlled. So, 
for instance, cases where D n might be close to the boundary value 1, but S n is significantly 
smaller than log(l/A) are of concern. In those cases, we shall turn off importance sampling 
to avoid the accumulation of a large likelihood ratio in Z A . In summary, suppose that the 
current position of the cumulative discount rate process S is given by s and that the position 
of the discounted process D is d. We shall continue applying exponential tilting as long as 
(s, d, x) belongs to some region C where it is safe to apply importance sampling. We do not 
apply importance sampling if (s, d, x) C. The precise definition of the set C will be given 
momentarily. 
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Using the notation introduced earlier leading to the statement of our Lyapunov inequality in 
Lemma [I] we can describe the sampler as follows. Let C be an appropriately defined subset 
of R x R x <S. Assume that the current state of the process W is wo = (sq, d , Xq) and let us 
write W\ = (si, d\, y) for a given outcome of the next transition. The function r (•) = (•) 
introduced in (1221) takes the form 

r e« ((so,d ,x ), (s u di,y)) (24) 

= I ((s Q ,do,x Q ) G C) Ue *^ exp (g»(sx - s )) 

+ /((s ,rf ,x ) £ C). 



The construction of an appropriate Lyapunov function h A (•) involves applying Lemma [TJ In 
turn, the definition of the set C is coupled with the construction of h A (•). We shall construct 
TtA (') so that (s,c?, x) = 1 implies (s,d,x) ^ C. Moreover, we shall impose the condition 
h A (•) G [0, 1]. Assuming h A can be constructed in this way we immediately have that the 
Lyapunov inequality is satisfied outside C. We then need to construct h& on C. We wish to 
find an asymptotically optimal change-of- measure, so it makes sense to propose 

h A (a, d, x) = 0(P (sAx) (T A < oo) 2 - pA ), 

where pA \ as A \ (recall the definition of asymptotic optimality given in the Intro- 
duction). On the other hand, we have that 

P(s,d,x) (Ta < oo) = P( ,o,x) (d + exp (s) AD^ > 1) 
= p (o,o,x) (d^ > exp (s) 

^exp(s^)[A/(l-d)f*. (25) 

Motivated by the form of this approximation, which is expected to hold at least in logarithmic 
sense as exp(— s)[(l — d)/A] — > oo, we suggest a Lyapunov function of the form 

h A (s, d, x) = mm{c 2 ^- pA exp ([29, - p A ]s) [A/(l - d) + ] 2e *^u 9t (x) u ^ PA (x) , 1}. 

The introduction of the function uq^ [x] ue»~p A (x) in a multiplicative form as given above 
is convenient for the purpose of verifying Lyapunov inequalities for importance sampling in 
the setting of Markov random walks (see Blanchet, Leder and Glynn (2009)). The constant 
ca > 0, which will be specified in the verification of the Lyapunov inequality, is introduced 
as an extra degree of freedom to recognize that approximation (|25|) may not be exact. The 
exponent on top of c A allows to make the estimates in the verification of the Lyapunov 
inequality somewhat cleaner. Note that we have h A (•) G [0, 1] and the set C is defined via 

C= {(s,d,x) : h A {s,d,x) < 1}. (26) 

We do not apply importance sampling whenever we reach a state (s, d, x) satisfying h A (s, d, x) = 
1. 
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We shall close this section with a precise description of our state-dependent algorithm. The 
following procedure generates one sample of our estimator. 

State-Dependent Algorithm 

Step 1: Set p A = l/log(l/A) and c A = (B 2 / B 1 )p^ 1+1/{2e *~ PA)) with < B h B 2 < oo as 
indicated in Proposition [1] below. Initialize (s,d,x) <— (0,0, x ). 

Step 2: Initialize likelihood ratio L <— 1. 

Step 3: While at (s, d, x), do the following: 

1. If (s, cf, x) G C defined in (126|) i.e. h&(s,d,x) < 1, 
i. Generate Xi from the kernel 



u 9t (x)' 



Say we have realization X\—y. 

ii. Given Xl = sample 7(2/, £1) from the exponential tilting 

P d *(l(y,Zi) e <fc) =exp(^-x(y,^))P(7(y,6) e tfe). 

Say we have 7(2/, £1) = 2;. 

iii. Sample A(y, 771) from the nominal distribution of 771 given Xi = y and 7 (y, £1) = 
z. 

Say we have \(y,r)i) = w. 

iv. Update 

t r 1 a \ u oX x ) 

L 4— L x exp(-0*£) — H"- 

Else if (s,d,x) ^ C i.e. h^(s,d,x) = 1, 

i. Sample Xi from its nominal distribution. Given X\ = y, sample 7(2/, £1) and 
X(y, f]i) from their nominal distributions. Say the realizations are 7(2/, £1) = y 
and A(y, 771) = w. 

2. Update 

(s, d, x) <— (s + z, d + Aw exp(s + z), y). 

3. If d > 1, output L and stop; else repeat the loop. 



The variance analysis of the unbiased estimator L as well as the termination time of the 
algorithm are given in the next sections. 
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5 Efficiency of State-Dependent Importance Sampling 



In order to verify asymptotic optimality of L we first show that Ha (•) satisfies the Lyapunov 
inequality given in Lemma [TJ We have indicated that the inequality is satisfied outside C. 
On the other hand, one clearly has that Ha (s,d,x) = 1 for d > 1, so the boundary condition 
given in Lemma [T] is satisfied. Consequently, in order to show that Ha (•) is a valid Lyapunov 
function and that v a (w) < Ha (w) we just have to show the following proposition. 

Proposition 1 Suppose that Assumptions 1 to 3 are in force and select bo < oo such that 
< l/[mi e€{0 ^) xeS u e (x)} 2 < b . 

i) Select b\ > such that for each 5 G (0, #*) 

exp(^ (6,-5)) < 1 - 5/i + M 2 

where \i = dip(9*)/d9 > 0. 
iij Picfc 62 G (0, 00) snc/i t/iat 

sup E x .[A(X 1 ,r ?1 ) 2e *- <5 exp((^ - 5) 7 (X 1 ,^ 1 ))] < b 2 

xg5,<5g(0,6»*) 

and mate iae following selection of B\, B 2 , Pa ond ca-' 

111) Select < B u B 2 < 00 and p A , c A > so inai p A \ 0, p A G (0, 0*), c A = (B 2 /Bi)^ (1+1/(2 **" pa)) , 
-E?iPa < 1 and 

&0&2PA (1 - p A /i + Qip A ) < 1 



Then, Ha (•) satisfies the Lyapunov inequality §2&\l on C assuming that r (■) = (■) is azven 
as zn p^p . 

Proof. First, 60 is finite because ug t (-) is strictly positive and S is finite. The fact that 
the selections in i) and iii) are always possible follows from straightforward Taylor series 
developments. The selection of b 2 in ii) is possible because of Assumptions 2 and 3 combined 
with Holder's inequality. We shall prove the concluding statement in the result using the 
selected values in i), ii) and iii). Assume that (s,d, x) is such that Ha (s,d,x) < 1. To ease 
the notation we write 71 = 7 (X±, £1) and Ai = A (Xi, rji). We need to show that 

E x h A (s + 71, d + exp (s + 7l ) AAi, X x ) L e , [X x , 71] (27) 
< h A (s,d,x) , 

where 

t iv 1 u eA x ) , a \ 
LeA x ^li\ = 7TTT ex P(-^7l)- 

U8, Al ) 
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We divide the expectation in (|27[) into two parts, namely, transitions that lie in a region that 
corresponds to the complement of C and transitions that lie within C. To be precise, set 
ft a £ (0, 1) and put 

A = {exp ( 7l ) Ai > ft A exp (s) (1 - d)/A} 

and write A c for the complement of A. The expectation in (1271) is then equal to J\ + J2, 
where 

Ji = J & c (ft A + 7i ! d + exp(,s + 7i) AAi,Xi)L e jXi,7i]; A), 
J2 = E x (h A (s + lu d + exp(s + ll ) AAi,Xi)L fl jXi,7i];A c ). 
We first analyze Ji/ftA (s,ft\ x). Note that 

Ji < ^(L e jX 1)7l ];A) 



h/\(s,d,x) h A (s,d,x) 

< ^g, (g) ^x[exp (-^71) ; A] 

~~ inf^s «0» (y) «a (s, d, a;) 

Now we note that 

£ x [exp (-0»7i) ; A] 

= ^(exp (-0*7i) ; exp (-71) < A X A exp(s)/[(l - d)a A ]) 
A \ * exp(#*s) 



< 



1 - d j 

x ^ [A?*; A x > exp(- 7l - s)a A {l - d)/A] . 
Moreover, applying Markov's inequality we obtain that for each > 

^ [A?*; Ax A exp ( 7l + s)/[o A (l - d)} > l] 

Af +/3 exp(/3 7 i)" • 



< A^exp^) ^ 



[o A (l-d)F 
Selecting (3 = 9* — p A we obtain (using ii)) 

Ji < ( A V^ A exp((2^-pA)s)^(x)^[Af*- pA exi)((fA ; -/> A )7,)] 



h A (s,d,x) \l-dj a 2 l* PA mf yeS U0* (y) h A (s,d,x) 

< &2 < Mo 



(ftAC A ) 2e * PA infj, e5 u e . (3/) inf y6 5 ttfl.-^ (y) (a A c A ) 2e * PA 
To analyze J2 we note that on 

A c = {exp( 7 i + s)AiA/(l - d) < ft A } 

we have that 

h A (s + 71, d + exp (s + 71) AAi, X x ) 



/?-a (s, ft\ x) 
exp((20* - Pa)7i) I 1 



Aexp( g + 7l )A i y (2e - PA) up. (X 1 )u e ^ PA (XQ 

1 - d J Ug t (x) Uo t - pA (x) 

< exp((20, - PA )7i) (1 - a A y^'^ ^ TO 

(x) Me,-p A (x) 
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Therefore, we have that 
J2 

h (s, d, x) 

< (1 - a A )-<*-^E x ( exp((26>, - p A ) 7l ) ^ ( f l] . Ue *~ p * ^ L,.[X U 71 

= (1 - a A )-^-^)^ (exp((0, - PA)li) Ue "- pA ^ \ 

\ u e ,- PA (x) J 

= (l-a A )-( 2e *-^exp(^ (0,-p A )). 
Recall that we have assumed in ii) that b\ is selected so that 

exp (if) (0* - p A )) < 1 - paP + hp\. 

Thus we obtain 

— < (1 - a A )-^'^\l - pap + b lP 2 A ). 



h (s, d, x r 

Combining our estimates for J\ and J 2 together we arrive at 

J\ J2 



h(s,d,x) h(s,d,x) 

b b 2 (1 -p A p + V A ) 

~ (a A CA) 2e *- pA (1 - a A )W*-^ ■ 

Let a a = B\p A and c A = (B 2 j 5i)p^ f or p A < substitute in the previous 

inequality and conclude that 

h J2 



h (s, d, x) h (s, d, x) 

< b b 2 p A (1 - PaP + 6iPa) < 
_ B2 e,- PA + (1 _ BlpA )(2^- PA ) - > 

where the previous inequality follows by the selection of B\ and -B2 in Assumption iii). This 
concludes the proof of the proposition. ■ 

The next result summarizes the asymptotic optimality properties of the algorithm obtained 
out of the previous development. 

Theorem 3 Select p A = l/log(l/A) and c A = (B 2 /B 1 )p~ il+1/{26 ^ PA)) with < B u B 2 < 
00 as indicated in Proposition [B Then, the resulting estimator L obtained by the State- 
dependent Algorithm has a coefficient of variation of order 0(c 2 £*) and therefore, in partic- 
ular, it is asymptotically optimal. 

Proof. The result follows as an immediate consequence of the fact that h A is a valid 
Lyapunov function combined with Theorem [TJ ■ 
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6 Unbiasedness and Logarithmic Running Time of State- 
Dependent Sampler 



Throughout the rest of our development, in addition to Assumptions 1 to 3, we impose the 
following mild technical assumption. 

Assumption 4: For each x G S, Var (7 (x, £)) > 0. 

The previous assumption simply says that 7 (x, £) is random. The assumption is immediately 
satisfied (given Assumption 2) in the i.i.d. case. As we shall explain a major component in 
our algorithm is the construction of a specific path that leads to termination. Assumption 
4 is imposed in order to rule out a cyclic type behavior under the importance sampling 
distribution. 

We shall show that the state-dependent algorithm stops with probability one (and hence 
avoids artificial termination which causes bias, a potential problem with the algorithm in 
Section 3.3) and that the expected termination time is of order (log(l/A)) p for some p > 0. 
To do so let us introduce some convenient notation. Let Z n = (1 — D n )e~ Sn , and put 
Y n = (X n , Z n ) for n > 0. The dynamics of the process Y = (Y n : n > 0) are such that 

Y n+1 = (X n+1 ,Z n e-^ x ^^ - AX(X n+1 , Vn+1 )). 

It is then easy to see that our state-dependent algorithm, which mathematically is described 
by equations (l24"l) and (1261) . can be stated in the following way in terms of Y n : Given Y n , 

• Apply exponential tilting to 7pf„ + i, £ n +i) (using the tilting parameter #*) if 



• Transition according to the nominal distribution of the system if 

< Z n < c^ueSXnf/^-^ue^AXn) 11 ^-^- 

• Terminate if Z n < 0. 

Note that the region when Z n > CAAu e ,(A n ) 1/(2e * _PA) n e ,_ PA (A n ) 1/(2e * _PA:) corresponds to the 
region C (recall equation flU in Section 4). If < Z n < ca Au^ {X n ) l ^ 2e * ~ PA ^ue,- PA (X n ) 1 /( 28 * ~p&) 
we say that Y n is in C . Finally, we say that Y n is in B if Z n < 0. In other words, the set 
B is the termination set. Let us write m = \ni x& s{ueX x ) l ^ 2e *~ p ^ u e l -p A { x ) 1 ^ 2e * PA ' > } and 
M = sup^iue^xy/^-P^ue^-p^x) 1 ^ 29 *-^}. Note that < m < M < 00. A key ob- 
servation is that the set C is bounded. This will help in providing bounds for the running 
time of the algorithm as we shall see. 
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We will obtain an upper bound for the algorithm by bounding the time spent by the process 
in both regions C and C . Intuitively, starting from an initial position in C, the process 
moves to C in finite number of steps. Then the process moves to either C or B. If the 
process enters C before B, then from C it again moves back to C and the iteration between 
region C and C repeats until the process finally hits B, which is guaranteed to happen 
by geometric trial argument. Our proof below will make the intuition rigorous and shows 
that the time for the process to travel each back-and-forth between C and C is logarithmic 
in 1/A, and that there is a significant probability that the process starting from C hits 
B before C. This, overall, will imply a logarithmic running time of the algorithm. More 
precisely, we will show the following lemmas. The reader should keep in mind the selections 

Pa = l/log(l/A) and c A = (B./B^p^ 1 ^'^ = O (log(l/A) 1+1 /(^)) 
given in Theorem [3j 

Recall the notations P 6 *(-) and E e *[-] to denote the probability measure and expectation 
under the state-dependent importance sampler. Note that under P 9 *{-) no exponential tilting 
is performed when the current state Y n lies in C . 

Lemma 2 Denote Tcub = inf {n > : Y n G C U B}. Under Assumptions 1 to 4 we have 

E e y ;[T CuB }=0(d } A \ogc A ) 
uniformly over y G C for some constant p > 0. 

Lemma 3 Let Tq = inf{n > : Y n G C} and Tb = inf{n > : Y n G B}. If Assumptions 1 
to 4 hold, then, uniformly over y = (x ,z Q ) G C , 

P^(T B <T C )>^- 
for some constant c\ and p (the p can be chosen as the same p in LemmalE). 

Lemma 4 Denote Tcub = inf{n > : Y n G C U B} and suppose that Assumptions 1 to 4 
are in force. For any y = (x , Zq) G C , we have Py*(TcuB < oo) = 1 and 

i=°( log te)) 



The first lemma shows that it takes on average a logarithmic number of steps (in 1/A) 
for the process to reach either B or C from C . The second lemma shows that there is a 
significant probability, uniformly over the initial positions in C, that the process reaches B 
before C. The third lemma states that the time taken from C to C is also logarithmic in 
1/A. Lemmas [2] and H] guarantee that each cross-border travel, either from C to C or from 
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C to C, takes on average logarithmic time. On the other hand, Lemma [3] guarantees that a 
geometric number of iteration, with significant probability of success, will bring the process 
to B from some state in C . These will prove the following proposition on the algorithmic 
running time. 

Proposition 2 Suppose that Assumptions 1 to 4 hold. Then, for any yo = {xq, zq), we have 



for some p > 0. 



We now give the proofs of the lemmas and Proposition [2J 

Proof of Lemma [2]. Our strategy to prove Lemma [2] is the following. Given any initial 
state y = (x , z ) G C, we first construct explicitly a path that takes y to B within 
O(logCA) steps (which we call event A(xq) below), and we argue that this path happens 
with probability ^( c a P ) for some p > 0. Then we look at the process in blocks of O(logCA) 
steps. For each block, if the process follows the particular path that leads to B, then T B , 
and hence Tbuo> is hit; otherwise the process may have hit C, or may continue to the next 
block starting with some state in C . In other words, T B uc is bounded by the time from 
the initial position up to the time that the process finishes following exactly the particular 
path in a block. We note that a successful follow of the particular path is a geometric r.v. 
with parameter 0(c^ p ), and hence the mean of T BuC is bounded by 0(c A p ) x O(logCA) and 
therefore the result. Now we make the previous intuition rigorous. 

We first prove some elementary probabilistic bounds for and A(-). As in Section 4 we 
simplify our notation by writing 7„ = 7(X n ,£ n ) and A n = \(X n ,£ n ). We first argue that for 
any y = (x, z) G C, 



for some < u\ < 1. Note that the initial conditioning in the probability in (128]) depends 
on y only through x. 

We prove ( 1281) by contradiction. Suppose ( l28j) is not true, then there exists some Markov 
state w such that 



P e *(T B <oo) = l and 





(28) 




Now if this happens and additionally 
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which obviously implies 



then 



e 



< 1, 



which contradicts the definition of 8*. Hence we are left with the possibility that 



Pi* 



but this contradicts our non-degeneracy assumption, namely, Assumption 4. 
Using (1281) . note that we can pick u 2 > small enough such that 

for any y = (x, z) G C . This follows from a contradiction proof since the non-existence of 
u 2 would imply e -71 =0 a.s. 

On the other hand, it is easy to see that there exists r\ and r 2 and a small enough 113 > 
such that 

P£ (X, = r 2 , A(r 4 , Vl ) > u 3 ) > e 2 > (30) 

since otherwise Ai = a.s. 

We will now construct the path A(xq) as discussed earlier in the proof. This path will 
depend on the initial position y = (xq, zq) G C, but it has length O(logCA) uniformly 
over any initial position in C . The path has the property that whenever the Markov state 
hits 7*1, it would go to r 2 with X(r 2 ,r] n ) > u 3 in the next state. Moreover, for every step, 
e^" /( - Xn '^ug t (X n _ 1 ) 1 ^ e * /u 6t (X n ) l l e * < ui and e _7n > u 2 . The path evolves in a periodic way 
i.e. it hits rx in every / steps for N times, where N is a number to be determined later. 
The existence of I and the occurrence of such periodic cycles with positive probability is 
guaranteed by the irreducibility of the Markov chain X n . In other words, consider the event 
A{xq) given by 

A{x Q ) = {*** = !,...,*, X a+kl = n, X a+kl+1 = r 2 , A(r 2 , r) a+k i+i) > u 3 ; 
for i = 1, . . . , a + Nl + b, e^*M ^*"lT ^ u ^ e ~ M) * u ^ 

Xa+Nl+b — X(\ 

where a is the number of steps for the initial state Xq to reach n and b is the number of steps 
for the last hit on r 2 back to state x . Note that a and b all depend on x , but we suppress 
the dependence for notational convenience. N is an integer that we will pick momentarily. 
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Under A(x ) we have 



Z a+Nl+b = ze -^--^+™+» - AA ie -^--^+^+ 6 - AX 2 e-" /s --^+ Nl + b AX a+Nl+b 

< zu <f m + h - Au 3 (u a 2 +Nl + u a 2 +(N ~ 1)l + ••• + <) 

, _ (N+l)l 

< zu a+Nl +b _ Au3U aJ__^ 

1 - U 2 

Now pick N to be the smallest integer at least as large as 

log(K +b Mc A A(l - u 2 ) + Au 3 u a 2 +l )/(Au 3 u a 2 )) 
l\og{l/u x ) 

This implies that 

log((^o< +fe (l-n 2 ) + A M 3< + V(A^<)) 
nog(l/ Ml ) 

(note the definition of C and M above) and a simple verification reveals that Z a+ Ni+b < on 
A{xq). Hence if A(xq) occurs, then Tb is hit before step a + Nl + b. Note that N = O(logCA). 

Now note that given y = (x , z ) G C, the probability that A(xo) happens is larger than 
qa+Ni+b £ Qr some g > o. If we divide the steps of the chain into blocks of size r + Nl, where 
r = max ie 5{a(i) + b(x)}, then the number of blocks required for Z n to hit (and hence 
Tbuc is achieved) is bounded by a geometric r.v. with parameter q r+Nl . Taking also into 
account the length of the blocks, we have 

EftTBuc < -^wj(r + Nl) = 0(^ A \ogc A ) 

for some p > 0. ■ 



Proof of Lemma [31 Given an initial position yo = (xq, zq) G C. It suffices to show that 
the path A(xq) we have constructed in the proof of Lemma [2] does not hit Tc before Tb i.e. 
it does not hit Tc for every step up through a + Nl + b. The conclusion of Lemma [3] then 
follows by noting that Py*(Ts > Tc) > Py*(A(x )). To prove Tc is not hit for every step, 
we show that Z n < c A Au e ,(X n ) 1 /( 2e *-' ,A )u ,_ PA (X n ) 1 /( 2e *-^) i.e. Z n e B U C, for every 
n = 1,..., a + NI + b by induction. Supposed < c A Au B XXn) ll(2d "~ PA) UB,-p A (X n ) 1 ^ 2d ''- p ^\ 
then 



Z n +i — Z n e 7n+1 — AA n+ i 



ueMn) 1 ^ 

<c A A^(X n+1 ) 1/(2e '-^ ) ^_ PA (X„ +1 ) 1/(29 *-^ ) 

for small enough A, where U\ is defined in f l2"H|) . by choosing the eigenvectors v,0 t _s(x) that 
are continuous in 5 within a small neighborhood of uniformly over all x E S. Hence we 
have proved our claim. ■ 
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Proof of Lemma 31 Suppose we start at y = (x , z ) G C. Consider 7„ = Y^i= T _i 7» 
where r n = inf{i > r n _i : Xj = x }- Inside region C the random walk S n = YTj=\ Ij nas 
positive drift i.e. E^ n > 0. For the process to hit C U B, it suffices to have 

^ e _ 7l _..._ 7n _ AAie - 7 2— ~ 7 n AA„ < mc A A 

or equivalently 

A(A n + mc A A)e 7l+ '" +7 " + AA ri _ 1 e 7l+ - +7 - 1 + • • • + AAie 71 > z 

This will be implied by the condition A(A„ + mcAA)e 7lH 1-7,1 > Zq, which in turn can be 
achieved if S n > log(zo / (mc a A)) . Note that if we only consider the steps when X n = x , 
then the condition becomes S n > log(2 /(mcAA)) where S n is now a positively drifted 
random walk. This happens with probability one and the expected time for this to happen 
is O ( log ( m( f° A ) ) , which provides an upper bound for Ey* [Tcub] ■ ■ 



Proof of Proposition Consider an initial position at yo = (xq,Zq) G C (if yo = 
(xq,zo) G C the same analysis goes through resulting in a shorter mean running time). 
With probability one Y n will enter C U B by Lemma HI If Tc> < Tb then by Lemma [3] 
the process hits B with probability that is bounded away from zero uniformly over n 
otherwise it goes back to C. Hence by geometric trial argument the process will hit B 
eventually. We obtain the first part of the proposition. 

We now consider E^'Tb- Suppose first that yo = (xo,z ) G C . Write 

E%T B = E e y l [T fl ; T b < T c ] + E e y ; [T B ; T B > T c ) 

Let T B = T B — Tc on the set T B > Tc i.e. T B is the residual time to hit B once Tc is first 
hit. We can write 

E ll TB = E Io T buc + E e y ; [T B ; T b > T c ] = E e y ;T BuC + E e y ; [E^ (j/o) T B ; T B > T c ] 

where Yr c (yo) is the state at time Tc (the yo as a parameter emphasizes the dependence on 
the initial position yo). We further write 

E y* T B = E y * T BuC + E y * [E^ {yo) [f BuC i + [T B ; f B > f c >}};T B > T c ] (31) 

where T B = T B — Tc on the set T B > Tc ■ 
Let f(y) = E y * T B . dSTJ) leads to 

f(y ) < Ey* T Bu c + E e y ;[E e * {yo) f Bu c;T B > T c ] + sup f(w)P^(T B > T c ) 



<K'Jb^c + cEI; 



f Y Tc (yo) 



+ sup f{w)P 9 y ;{T B > T c ) (32) 

wee 
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where c > is a constant, using Lemma HI Now consider 

Y Tc (y Q ) = z Q e-^—-^ T c - AA ie - 72 -- 7T c AA Tc < z e~ 7l ~""" TT c 

and hence 

lnp - I 

mc^A 



'Y T (?/q) 

lo S ( ~ a ) < lc, g - 7i 1T C ~ log(mc A A) 



Now 



[-7i 7t c ; T B > T C \ < <* [| 7 i| + • • • + |7t suC | ; T B > T c ] 

<<1I7i| + --- + |7t suC |] 



yo i 



< ceL:t bvjc 



where c = sup x&s E e *[\'j(X 1 ,t;i)\\X 1 = x] < oo, by Wald's identity and Assumption 1 in 
Section 2. This gives 



E yl 



< iogz„p£ (t b > T c )+cE e -T mc -\ og {mc^)p"-{T B > t c 



(33) 

Putting (1331) into (1321) and using the fact that zq < McaA for y = (^o, z o) £ C" yields 



/(2/o) < <*T BUC + cP* (T B > T c ) log(Mc A A) + ccE e y ;T BuC - c\og{mc A A)P e y ;{T B > T ( 



'-cj 



+ mvf(w)P%(T B >T l 



c) 

weC 

Now taking supremum on both sides and using Lemma [2] and [31 we get 

sup /O) < O (c^logCA + log ( — - - ) -=-) 
wee V \caAJ (f A J 

Suppose we start with y e C, then combining with Lemma H] concludes the proposition. 



7 Extensions and Remarks 

As noted in equation (jSJ) the perpetuity -D satisfies the distributional fixed point equation 

D = d B + AD' , 

where A = exp(F), D' has the same distribution as D, and (A, B) is independent of D' . 
We wish to compare our development to that of Collamore et al (2011), which state their 
results in the absence of Markov modulation, so to make the comparison more transparent 
we will omit the Markov chain {X n } from our discussion here. We have assumed that B is 
non-negative but we believe that this is not a strong assumption. We can typically reduce to 
the case of non-negative B. Indeed, if the B^s can take negative values, we can let Bi = \Bi\ 
and define 

D = B 1 + exp(Y 1 )B 2 + .... (34) 
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Note that the tail of D is in great generality equivalent (up to a constant) to that of D. 
Since {D > 1/A} C {D > 1/A} we can use the likelihood ratio constructed to estimate 
the tail of D but apply it to the event I (D > 1/A). The same efficiency analysis applies 
automatically. So, throughout our discussion we shall keep assuming that B is non-negative. 

Now, one could consider more general fixed point equations, for instance, 

D = d B + A max (D', C) (35) 

where D' has the same distribution as D, (A, B, C) are independent of D' . This equation 
is the focus of Collamore et al (2011). Their assumptions are similar to ours, primarily 
that 9* > satisfying EA e * = 1 can be computed; that suitable moment conditions are 
satisfied for B and C, and that the associated exponentially tilted distributions can be 
simulated. Their estimator for the tail of D is biased, but it enjoys asymptotic optimality 
properties parallel to strong efficiency. In this sense this estimator is close in spirit to our 
state-independent importance sampler. However, their construction is completely different 
to ours, as we shall explain now. 

Equation fl35|) characterizes the steady-state distribution (if it exists) of the Markov chain, 
{V n : n > 0}, defined via Vb = v and 

V n+ i = B n+1 + A n+1 max (V n , C n+1 ) , (36) 

where {(A n , B n , C n ) : n > 1} is an i.i.d. sequence. Collamore et al (2011) uses a regenerative 
ratio representation for the steady-state distribution of {V n }, assuming suitable minorization 
conditions required for regeneration are in place. Clearly, there are advantages to simulating 
V n (which is Markovian) as opposed to the "backward" process, which corresponds to the 
discounted reward process (whose limit is the perpetuity and it is not Markovian but requires 
keeping track of S n — Y\ + ... + Y n ). Nevertheless, one of the important features of impor- 
tance sampling is that it can be applied to estimate conditional expectations of sample path 
functions given the even of interest (in this case D > 1/A). This is also why we wanted our 
algorithms to be developed under the presence of Markovian modulation, without resorting 
to a decomposition such as ()5]). 

This feature, we believe, is quite attractive specially in some of the applications behind our 
motivation to study discounted process, such as insurance and finance. The problem with 
using the "forward" representation (i.e. {V n } and the associated regenerative ratio) is that, 
while the tail estimation of D is preserved, it is difficult to use the associated algorithms for 
estimation of conditional sample path expectations. 

Finally, we point out that a similar coupling idea to the one used in the construction of 
flMj) can be applied to reduce the analysis of fl35|) to the case of standard perpetuities. In 
particular, define B n+ i = B n+ i + A n+ iC n+ i, and plug this definition into to define D. 
Then we have that D > D, where D is the limit of the "backward" representation associated 
to (I3"5"|) . Since D and D are typically tail equivalent (except for a constant), again we can 
proceed as indicated earlier. Because {D > 1/A} C {D > 1/A} we can use the likelihood 
ratio constructed to estimate the tail of D but apply it to the event I (D > 1/A). Again, 
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bias is introduced because of the infinite horizon nature of D, but the rare-event simulation 
problem has been removed by the importance sampling strategy constructed based on D. 



8 Numerical Experiments 

We run our algorithm for the ARCH(l) sequence in Example 1 with ao = 1, 2 and 
a.i = 3/4, 4/5. In this example there is no Markov modulation. Using the transforma- 
tion into T n as shown in Example 1, the tail probability of the steady-state distribution of 
the ARCH(l) process with target level 1/A is equivalent to the tail probability of a perpe- 
tuity with \(Xi,r]i) = a , 7pQ,£j) = log«i + logx? where Xi are i-i-d- chi-square r.v.'s, and 
target level cui/A. One can compute easily that 

and hence verify that Assumptions 1, 2 and 3 are satisfied. Moreover, the conditions in 
Proposition [1] are also satisfied with appropriate selection of parameters (see the discussion 
below). Our choices of a± would correspond to 9* with values 1.68, 1.46 and 1.34 respectively. 
This implies a tail of the steady-state ARCH(l) model that has finite third moment but not 
the fourth, which frequently arises in the financial context (see, for example, Mikosch and 
Starica (2000)). 

We test the performance of both our state-independent and state-dependent sampler pro- 
posed in Sections 3 and 4 by comparing with crude Monte Carlo. To gauge the performance 
of our algorithms as A becomes small, we tune A from 0.1 to 0.00001 to see the effect of the 
magnitude of A to the output performance. 

For crude Monte Carlo, we truncate the maximum number of steps to be 1000 (so that 
the sequence does not iterate indefinitely; note that this would certainly cause bias in the 
sample) . 

In the case of the state-independent importance sampler, we use a = 9/10 and = 
101og(l/A) (where a is the proportion of the barrier that upon touching would lead to 
the stop of importance sampling and n* is the number of steps we continue to simulate after 
2a/ ). 

For the state-dependent sampler, we can verify that b Q = 1, b x = sup 0< ^ <9f (ip" (() + 
(ip' 2 (C)))/2 and b 2 = max{a!o *, 1} satisfy the conditions in Proposition 2. To ensure that 
the Lyapunov inequality holds for small A one can choose B\ and B2 in Proposition 2 to 
satisfy B 2 > 1 and 

ti - 25A - h -ir > 

B 2 * 

In particular we can choose B 1 = O.45///(20,) and B 2 = max{(6 6 2 /(0.45/i)) 1 / e *, 1}. 
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For each set of input parameters (i.e. a , ot\ and A) we simulate using crude Monte Carlo, 
the state-independent sampler, and the state-dependent sampler. For each method we fix 
the running time to be five minutes for comparison. In the following tables we show the 
estimate, empirical coefficient of variation (standard deviation divided by the estimate), and 
95% confidence interval for our simulation. Tables are deferred to the appendix below. 

Next we also run the algorithm for a Markov modulated perpetuity. The modulating Markov 
chain lies in state space {1,2} and has transition matrix 

"i i ■ 

K = 22 
[10 

We use A(l,77i) = A(l) = 1 and A(2, 771) = A(2) = 2 a.s.. Also, 7(1,6) = M 2 / 3 ) + lo gxf 
and 7(2,6) = log(3/4) + logx 2 , where again xf & re i i-d. chi-square random variables. 

Again we experiment using crude Monte Carlo and both state-independent and dependent 
importance samplers. Similar to the ARCH(l) setup, a simple calculation reveals that 
e x(i,e) = ( 2 x (2/3)) 29 r(# + l/2)/r(l/2) and = (2 x (3/4)) 2e r(# + l/2)/r(l/2). 

M oreover, = (e x ^ + Ve 2 ^) + 8exM+x(m) / A, so 9* = 1.60. We take u e (l) = 

(yf&m + 8e x(i,e)+x(2,e) + e x(i,*))/(4 e x(i,e)) and u e {2) = 1. 

The computational effort needed for this Markov-modulated problem appears to be sub- 
stantially heavier than the case of ARCH model, and hence we perform a longer and more 
extensive simulation study. We tune A from 0.1 to 0.002, and for each scenario we run 
the simulation for one hour for each method. For crude Monte Carlo, we use 100, 000 
as our step truncation. For state-independent importance sampler we use a = 9/10 and 
n* = 1000 log(l/A). For state-dependent importance sampler, the setup according to Propo- 
sition 1 is as follows. First, for every 9 e (0,$*), we use a normalization of ug(-) such that 
min^s ug(x) = 1. This choice of ug(-) satisfies continuity in 9 in a small neighborhood 
(#* — Pa, 9*) (regarding ug(-) as a vector- valued function). This is because the largest eigen- 
value of the matrix Qg is isolated; a small perturbation of 9 leads to a small change in the 
angle of the associated eigenspace, and hence the eigenvector taken as the direction of the 
eigenspace is continuous in 9. Consequently, the minimum taken over the components of 
the eigenvector is also continuous in 9, and so is the eigenvector in the particular normal- 
ization that we use. As a result, we can choose b — 1. For other parameters, we then use 
h = sup Ce(0A) (V"(C) + 4>' 2 (C))/2, and b 2 = max{su Pa;&s \(x) 29 % 1} sup xeS E x e^ 9 *\ Then 
taking B 1 = 0.45/^/(26'*) and B 2 = max{(6 &2/(0.45/i)) 1/e % 1} will satisfy the Lyapunov 
inequality for small enough A. The numerical outputs are shown in the appendix below. 

For the ARCH model, it is notable from the coefficient of variation and confidence interval 
that both state-independent and state-dependent samplers perform better than crude Monte 
Carlo starting from A = 0.001. Crude Monte Carlo has much larger coefficient of variation 
when A is 0.0005, and it merely fails (i.e. does not generate any positive sample) when A 
is 0.00001. On the other hand, the coefficient of variation for state-independent sampler 
remains at around 1 to 2 and that for state-dependent sampler remains under 50 for all the 
cases we considered. The state-independent sampler appears to perform better than state- 
dependent sampler for our range of A, although one should keep in mind there is bias issue 
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in that algorithm. Similar results hold for the Markov-modulated perpetuity, where crude 
Monte Carlo fails completely when A is 0.005 or larger while the importance samplers still 
perform reasonably well. 

8.1 Appendix: Numerical Output 



ARCH model, parameter values: a — 1, a± — 3/4 

Crude Monte Carlo 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


6.65 x 10" 2 


3.75 


[6.43 x 10- 2 ,6.86 x 10~ 2 ] 


A 


= 0.05 


2.89 x 10~ 2 


5.80 


[2.74 x 10~ 2 ,3.04 x 10~ 2 ] 


A 


= 0.001 


1.11 x HT 4 


95.05 


[1.37 x 10~ 5 ,2.08 x 10~ 4 ] 


A 


= 0.0005 


2.11 x 10~ 5 


217.9 


[-2.02 x 10~ 5 ,6.24 x 10~ 5 ] 


A 


= 0.00001 





N/A 


N/A 



State-Independent Sampler 





Estimate 


C.V. 


95% C.I. 




A = 


0.1 


6.84 x 10~ 2 


1.79 


[6.82 x 10" 2 ,6.86 x 10" 


- 2 ] 


A = 


0.05 


2.84 x 10~ 2 


1.76 


[2.83 x 10- 2 ,2.85 x 10" 




A = 


0.001 


1.10 x HT 4 


1.75 


[1.09 x 10~ 4 , 1.10 x 10" 




A = 


0.0005 


4.01 x 10~ 5 


1.79 


[3.99 x 10~ 5 ,4.02 x 10" 




A = 


0.00001 


1.34 x 10~ 7 


1.72 


[1.33 x 10~ 7 ,1.35 x 10" 


- 7 ] 



State-Dependent Sampler 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


6.80 x 10" 2 


3.68 


[6.63 x 10" 2 ,6.96 x HT 2 ] 


A 


= 0.05 


2.82 x HT 2 


5.81 


[2.67 x 10~ 2 ,2.96 x KT 2 ] 


A 


= 0.001 


1.45 x 10" 4 


24.77 


[7.70 x 10~ 5 ,2.14 x KT 4 ] 


A 


= 0.0005 


4.22 x 10~ 5 


29.93 


[1.49 x 10~ 5 ,6.95 x KT 5 ] 


A 


= 0.00001 


2.48 x 10~ 7 


37.71 


[-5.43 x 10~ 8 ,5.49 x 10~ 7 ] 


ARCH model, parameter values: ao 


= 2, ai = 3/4 






Crude Monte Carlo 






Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


1.51 x lO" 1 


2.37 


[1.48 x 10- 1 , 1.54 x 10' 1 ] 


A 


= 0.05 


6.64 x 10~ 2 


3.75 


[6.40 x 10- 2 ,6.88 x 10" 2 ] 


A 


= 0.001 


2.61 x HT 4 


61.92 


[1.13 x 10" 4 ,4.08 x 10- 4 ] 


A 


= 0.0005 


8.19 x 10~ 5 


110.5 


[1.64 x 10" 6 ,1.62 x 10- 4 ] 


A 


= 0.00001 





N/A 


N/A 
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State-Independent Sampler 



Estimate C.V. 95% C.I. 



A 


= 0.1 


1.50 x 10" 1 


1.92 


[1.495 x lO -1 , 1.503 x 10" 1 ] 


A 


= 0.05 


6.85 x 10~ 2 


2.48 


[6.83 x 10- 2 ,6.88 x 10~ 2 ] 


A 


= 0.001 


3.00 x 10~ 4 


1.87 


[2.99 x 10~ 4 ,3.02 x 10~ 4 ] 


A 


= 0.0005 


1.09 x KT 4 


1.69 


[1.09 x 10~ 4 , 1.10 x lO" 4 ] 


A 


= 0.00001 


3.69 x 10- 7 


1.69 


[3.67 x lO" 7 , 3.71 x lO" 7 ] 






State-Dependent Sampler 






Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


1.50 x lO" 1 


2.38 


[1.46 x 10-\1.54 x 10" 1 ] 


A 


= 0.05 


6.92 x 10~ 2 


3.66 


[6.57 x 10^ 2 ,7.27 x 10~ 2 ] 


A 


= 0.001 


5.54 x 10~ 4 


42.46 


[-2.14 x lO -4 , 1.32 x lO -3 ] 


A 


= 0.0005 


1.61 x 10~ 5 


42.72 


[-8.85 x HT 6 ,4.11 x 10~ 5 ] 


A 


= 0.00001 


9.13 x 10~ 9 


34.65 


[-8.78 x 10~ 9 ,2.70 x lO" 8 ] 



ARCH model, parameter values: cto = 1, ai — 4/5 



Crude Monte Carlo 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


7.79 x 10" 2 


3.44 


[7.45 x 10- 2 ,8.14 x HT 2 ] 


A 


= 0.05 


3.53 x KT 2 


5.23 


[3.29 x 10~ 2 ,3.76 x KT 2 ] 


A 


= 0.001 


2.27 x KT 4 


66.39 


[2.80 x 10~ 5 ,4.26 x 10~ 4 ] 


A 


= 0.0005 


9.13 x 10~ 5 


104.6 


[-3.52 x 10- 5 ,2.18 x 10" 4 ] 


A 


= 0.00001 





N/A 


N/A 



State-Independent Sampler 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


7.78 x 10" 2 


1.72 


[7.75 x lO" 2 , 7.81 x 10~ 2 ] 


A 


= 0.05 


3.43 x 10~ 2 


1.56 


[3.41 x 10- 2 ,3.44 x 10~ 2 ] 


A 


= 0.001 


2.02 x 10" 4 


1.57 


[2.01 x 10~ 4 ,2.03 x 10~ 4 ] 


A 


= 0.0005 


8.00 x 10~ 5 


1.53 


[7.96 x 10~ 5 ,8.05 x 10~ 5 ] 


A 


= 0.00001 


4.21 x 10~ 7 


1.55 


[4.18 x 10~ 7 ,4.24 x 10~ 7 ] 






State-Dependent Sampler 






Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


7.84 x 10" 2 


3.39 


[7.62 x 10- 2 ,8.07 x 10~ 2 ] 


A 


= 0.05 


3.47 x 10~ 2 


5.21 


[3.27 x 10- 2 ,3.67 x 10~ 2 ] 


A 


= 0.001 


1.31 x KT 4 


27.97 


[4.61 x 10~ 5 ,2.16 x 10~ 4 ] 


A 


= 0.0005 


6.52 x 10~ 5 


23.15 


[2.71 x 10- 5 ,1.03 x 10~ 4 ] 


A 


= 0.00001 


8.09 x 10~ 7 


26.51 


[-6.15 x 10" 9 ,1.62 x 10~ 6 ] 
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ARCH model, parameter values: a = 2, a± — 4/5 

Crude Monte Carlo 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


1.59 x 10- 1 


2.30 


[1.55 x lO -1 , 1.64 x 10- 1 ] 


A 


= 0.05 


7.96 x 10~ 2 


3.40 


[7.62 x 10- 2 ,8.30 x lO" 2 ] 


A 


= 0.001 


4.49 x 10" 4 


47.20 


[1.71 x 10- 4 ,7.27 x lO" 4 ] 


A 


= 0.0005 


2.69 x 10~ 4 


60.92 


[5.39 x 10~ 5 ,4.85 x 10~ 4 ] 


A 


= 0.00001 





N/A 


N/A 



State-Independent Sampler 





Estimate 


C.V. 


95% C.I. 




A = 


0.1 


1.62 x lO" 1 


1.67 


[1.61 x 10~ 


_1 , 1.62 x 10- 


- 1 ] 


A = 


0.05 


7.74 x 10~ 2 


1.57 


[7.71 x 10" 


" 2 ,7.77 x 10" 


- 2 ] 


A = 


0.001 


5.12 x HT 4 


1.57 


[5.10 x 10" 


" 4 ,5.15 x 10" 




A = 


0.0005 


2.03 x 10" 4 


1.74 


[2.01 x 10- 


" 4 ,2.04 x 10" 




A = 


0.00001 


1.07 x KT 6 


1.59 


[1.06 x 10" 


" 6 , 1.08 x 10- 


- 6 ] 



State-Dependent Sampler 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


1.61 x KT 1 


2.28 


[1.56 x 10 _1 ,1.65 x KT 1 ] 


A 


= 0.05 


7.77 x 10~ 2 


3.44 


[7.33 x 10- 2 ,8.22 x 10" 2 ] 


A 


= 0.001 


4.32 x 10~ 4 


44.24 


[-2.64 x 10" 4 ,1.13 x 10" 3 ] 


A 


= 0.0005 


1.39 x KT 4 


23.65 


[6.72 x 10- 6 ,2.71 x lO" 4 ] 


A 


= 0.00001 


5.93 x lO" 7 


32.41 


[-5.70 x 10~ 7 , 1.76 x 10~ 6 ] 



Markov-modulated perpetuity 

Crude Monte Carlo 





Estimate 


C.V. 


95% C.I. 


A 


= 0.1 


7.23 x 10~ 2 


3.58 


[5.64 x HT 2 ,8.82 x 10~ 2 ] 


A 


= 0.05 


2.61 x 10~ 2 


6.12 


[1.60 x 10~ 2 ,3.62 x 10~ 2 ] 


A 


= 0.02 


3.09 x 10~ 3 


17.98 


[-4.07 x 10- 4 ,6.58 x 10~ 3 ] 


A 


= 0.005 





N/A 


N/A 


A 


= 0.002 





N/A 


N/A 
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State-Independent Sampler 





Estimate 


C.V. 


95% C.I. 




A = 


0.1 


5.82 x 10" 2 


45.89 


[5.55 x 10- 2 , 6.10 x 10" 


- 2 ] 


A = 


0.05 


2.08 x 10- 2 


10.45 


[2.04 x 10- 2 , 2.11 x 10" 


- 2 ] 


A = 


0.02 


5.24 x 10~ 3 


14.67 


[5.13 x 10~ 3 ,5.34 x 10- 


- 3 ] 


A = 


0.005 


5.92 x 10" 4 


11.25 


[5.73 x 10- 4 ,6.10 x 10" 


"I 


A = 


0.002 


1.37 x 10" 4 


9.25 


[1.34 x 10~ 4 , 1.40 x 10" 


"I 



State-Dependent Sampler 





Estimate 


C.V. 


95% C.I. 


A = 


0.1 


5.73 x 10" 2 


4.05 


[5.35 x 10- 2 ,6.10 x 10~ 2 ] 


A = 


0.05 


2.23 x 10~ 2 


6.62 


[1.88 x 10~ 2 ,2.58 x 10~ 2 ] 


A = 


0.02 


3.51 x 10~ 3 


16.83 


[1.74 x 10~ 3 ,5.27 x 10- 3 ] 


A = 


0.005 


4.40 x 10" 4 


47.68 


[-4.23 x 10~ 4 , 1.30 x 10- 3 ] 


A = 


0.002 


2.35 x 10~ 5 


44.40 


[-2.26 x 10~ 5 ,6.96 x 10~ 5 ] 
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